-
Notifications
You must be signed in to change notification settings - Fork 261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add allow_partial
#1512
Add allow_partial
#1512
Conversation
5af9b88
to
554ec40
Compare
CodSpeed Performance ReportMerging #1512 will degrade performances by 16.62%Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprisingly minimal changes here given the new functionality added - cool!
I'm a bit confused about the naming around enumerate_last_partial
- perhaps we could bikeshed just a bit here to make the purpose of this function more clear?
from pydantic_core import SchemaValidator, ValidationError, core_schema | ||
|
||
|
||
def test_list(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just parametrize here and have a test for validation fails and validation successes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same questions below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, tests lgtm
Curious to hear what @davidhewitt thinks about the way this is implemented - overall, looks good to me, I think having things centralized around the It's interesting to me the ways in which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall functionally looks like it's heading in the right direction, but I think there can be some simplifications. I also have a bunch of questions which hint at edge cases.
src/input/input_python.rs
Outdated
Self::Dict(dict) => dict.keys().iter().last(), | ||
Self::Mapping(mapping) => mapping.keys().ok()?.iter().ok()?.last()?.ok(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB both of these are potentially inefficient; .keys()
creates a new PyList
of all the keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, I don't love it, but at least it's only called when allow_partial=true
.
Is there a more efficient way? (especially for dicts?)
Long term, if we move to RustModel
and iterating over dicts when building typeddicts, we should be able to minimise use of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.call_method0("keys").iter().last()
might be better (Python keys iterator rather than list), but I think it still sucks. Can't get better until we iterate, as you say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested by adding:
#[pyfunction]
pub fn mapping_last_key_a<'a>(mapping: &'a Bound<'a, PyMapping>) -> Option<Bound<'a, PyAny>> {
mapping.keys().ok()?.iter().ok()?.last()?.ok()
}
#[pyfunction]
pub fn mapping_last_key_b<'a>(mapping: &'a Bound<'a, PyMapping>) -> Option<Bound<'a, PyAny>> {
mapping.call_method0(intern!(mapping.py(), "keys")).ok()?.iter().ok()?.last()?.ok()
}
and tested with
from typing import Mapping
import timeit
from pydantic_core import _pydantic_core
class MyMapping(Mapping):
def __init__(self, d):
self._d = d
def __getitem__(self, key):
return self._d[key]
def __iter__(self):
return iter(self._d)
def __len__(self):
return len(self._d)
mapping = MyMapping({str(i): i for i in range(100)})
v = _pydantic_core.mapping_last_key_a(mapping)
assert v == '99', v
v = _pydantic_core.mapping_last_key_b(mapping)
assert v == '99', v
def run_bench(func):
timer = timeit.Timer(
"func(mapping)", setup="", globals={"func": func, "mapping": mapping}
)
n, t = timer.autorange()
iter_time = t / n
# print(f'{func.__module__}.{func.__name__}', iter_time)
return int(iter_time * 1_000_000_000)
print(f'mapping_last_key_a: {run_bench(_pydantic_core.mapping_last_key_a)}ns')
print(f'mapping_last_key_b: {run_bench(_pydantic_core.mapping_last_key_b)}ns')
Output:
mapping_last_key_a: 2487ns
mapping_last_key_b: 1918ns
Outcome: they're both pretty slow, but your suggestion is slightly faster.
src/validators/validation_state.rs
Outdated
pub fn enumerate_last_partial<'i, I>( | ||
&self, | ||
iter: impl Iterator<Item = I> + 'i, | ||
) -> Box<dyn Iterator<Item = (usize, bool, I)> + 'i> { | ||
if self.allow_partial { | ||
Box::new(EnumerateLastPartial::new(iter)) | ||
} else { | ||
Box::new(iter.enumerate().map(|(i, x)| (i, false, x))) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than forcing dynamic dispatch here, it might be better to use .peekable()
at the callsites to be able to check if the current iteration was the last as part of the error pathway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peekable
made things worse, but I got rid of the Box dyn
in 47f1c15
and it helped quite a lot locally - comparing main to 47f1c15:
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Group ┃ Benchmark ┃ Before (µs/iter) ┃ After (µs/iter) ┃ Change ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ List │ list_of_ints_core_py │ 29.17 │ 30.09 │ +3.17% │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│ List JSON │ list_of_ints_core_json │ 42.49 │ 43.57 │ +2.55% │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│ Set │ set_of_ints_core │ 52.26 │ 52.81 │ +1.06% │
│ │ set_of_ints_core_duplicates │ 35.59 │ 35.51 │ -0.22% │
│ │ set_of_ints_core_length │ 53.59 │ 55.16 │ +2.94% │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│ Set JSON │ set_of_ints_core_json │ 56.31 │ 58.40 │ +3.70% │
│ │ set_of_ints_core_json_duplicates │ 44.14 │ 45.07 │ +2.12% │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│ FrozenSet │ frozenset_of_ints_core │ 16.60 │ 17.26 │ +3.99% │
│ │ frozenset_of_ints_duplicates_core │ 13.78 │ 14.78 │ +7.20% │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│ Dict │ dict_of_ints_core │ 85.74 │ 86.36 │ +0.73% │
├─────────────┼─────────────────────────────────────┼────────────────────┼───────────────────┼──────────┤
│ Dict JSON │ dict_of_ints_core_json │ 127.4 │ 126.5 │ -0.72% │
└─────────────┴─────────────────────────────────────┴────────────────────┴───────────────────┴──────────┘
|
||
|
||
def test_tuple_list(): | ||
"""Tuples don't support partial, so behaviour should be disabled.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? At least for variadic tuples this seems potentially acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely want to support it in future, I was just implementing it for the minimum set of validators to prove the idea initially.
src/input/return_enums.rs
Outdated
for (index, item_result) in iter.enumerate() { | ||
|
||
for (index, is_last_partial, item_result) in state.enumerate_last_partial(iter) { | ||
state.allow_partial = is_last_partial && validator.supports_partial(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the .supports_partial()
guard necessary (here and in general)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because if you enabled allow_partial
for a validator which doesn't have proper support for the feature, you would end up skipping errors in the wrong places.
E.g. if you had list[tuple[list[int], list[str]]
:
- the outer list validator supports
allow_partial
, so it'll ignore errors in the last entry in the input list - but if it passed
allow_partial=true
down to the tuple validator (which doesn't current support) allow_partial - it would pass
state
through to both inner list validators unchanged - then the list validator associated with the 0th entry in the tuple would get
allow_partial=true
- and therefore errors in the last entry of the 0th member of the tuple in the last entry of the outer list would be ignored when they shouldn't be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels to me like the "doesn't support partial" is the exception rather than the rule, and probably only for non-variadic tuples? Otherwise, I would have thought that:
- collections would always want to respect
allow_partial
when passed in (forwarding it to their last element) - everything else e.g.
with_default
etc should not even care aboutallow_partial
and just forward it naively
Err(ValError::LineErrors(line_errors)) => { | ||
for err in line_errors { | ||
errors.push(err.with_outer_location(key.clone())); | ||
if !is_last_partial { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'm feeling very baited to refactor these error combiners now 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #1517 as a spike to start down that road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely want to support it in future, I was just implementing it for the minimum set of validators to prove the idea initially.
I think we need it to be approximately right for most validators to actually be worth inviting users to try, and I'm not sure it is at the moment.
src/input/input_python.rs
Outdated
Self::Dict(dict) => dict.keys().iter().last(), | ||
Self::Mapping(mapping) => mapping.keys().ok()?.iter().ok()?.last()?.ok(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.call_method0("keys").iter().last()
might be better (Python keys iterator rather than list), but I think it still sucks. Can't get better until we iterate, as you say.
src/input/return_enums.rs
Outdated
for (index, item_result) in iter.enumerate() { | ||
|
||
for (index, is_last_partial, item_result) in state.enumerate_last_partial(iter) { | ||
state.allow_partial = is_last_partial && validator.supports_partial(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels to me like the "doesn't support partial" is the exception rather than the rule, and probably only for non-variadic tuples? Otherwise, I would have thought that:
- collections would always want to respect
allow_partial
when passed in (forwarding it to their last element) - everything else e.g.
with_default
etc should not even care aboutallow_partial
and just forward it naively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there's very common use cases missing from here:
- unions
- nullable
- defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think validators fit into three cases:
- already done
- doesn't apply - things like
IntValidator
whereallow_partial
doesn't mean anything - TODO simple cases - things like nullable where AFAIK it's just a matter of passing the
allow_partial
instruction to its nested validator - that's my understanding of these but maybe it's more complicated - TODO - collections or multiple nested validators where we need custom logic
I think validators break down like this:
Validator | Status |
---|---|
TypedDictValidator |
DONE |
UnionValidator |
TODO simple case? |
TaggedUnionValidator |
TODO simple case |
NullableValidator |
TODO simple case |
ModelValidator |
doesn't apply |
ModelFieldsValidator |
TODO |
DataclassArgsValidator |
TODO |
DataclassValidator |
doesn't apply |
StrValidator |
doesn't apply |
StrConstrainedValidator |
doesn't apply |
IntValidator |
doesn't apply |
ConstrainedIntValidator |
doesn't apply |
BoolValidator |
doesn't apply |
FloatValidator |
doesn't apply |
ConstrainedFloatValidator |
doesn't apply |
DecimalValidator |
doesn't apply |
ListValidator |
DONE |
SetValidator |
DONE |
TupleValidator |
TODO |
DictValidator |
DONE |
NoneValidator |
doesn't apply |
FunctionBeforeValidator |
TODO simple case? |
FunctionAfterValidator |
TODO simple case? |
FunctionPlainValidator |
TODO simple case? |
FunctionWrapValidator |
TODO simple case? |
CallValidator |
TODO simple case? |
LiteralValidator |
doesn't apply |
IntEnumValidator |
doesn't apply |
StrEnumValidator |
doesn't apply |
FloatEnumValidator |
doesn't apply |
PlainEnumValidator |
doesn't apply |
AnyValidator |
doesn't apply |
BytesValidator |
doesn't apply |
BytesConstrainedValidator |
doesn't apply |
DateValidator |
doesn't apply |
TimeValidator |
doesn't apply |
DateTimeValidator |
doesn't apply |
FrozenSetValidator |
DONE |
TimeDeltaValidator |
doesn't apply |
IsInstanceValidator |
doesn't apply |
IsSubclassValidator |
doesn't apply |
CallableValidator |
doesn't apply |
ArgumentsValidator |
TODO |
WithDefaultValidator |
TODO simple case |
ChainValidator |
TODO simple case? |
LaxOrStrictValidator |
TODO simple case? |
GeneratorValidator |
TODO |
CustomErrorValidator |
TODO simple case? |
JsonValidator |
TODO |
UrlValidator |
doesn't apply |
MultiHostUrlValidator |
doesn't apply |
UuidValidator |
doesn't apply |
DefinitionRefValidator |
TODO |
JsonOrPython |
TODO simple case? |
ComplexValidator |
doesn't apply |
If that table is correct, we should set the default for supports_partial
to true
, then set it to false specifically for fields which don't support it. But I'm still not that confident I'm write about all those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got rid of supports_partial
completely, and instead set state.allow_partial = false
on the few validators which don't support it yet.
please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now, I think this is the right default and makes it explicit where we have decided not to support yet 👍
Co-authored-by: David Hewitt <[email protected]>
See pydantic/pydantic#10748 for details.
Selected Reviewer: @sydney-runkle